Selection of clusters number and features subset during a two-levels clustering task
Authors
Abstract
Simultaneous selection of the number of clusters and of a relevant subset of features is one of the challenges of data mining. A new approach is proposed to address this difficult issue. It benefits from both two-level clustering approaches and wrapper feature selection algorithms. On the one hand, the former enhances robustness to outliers and reduces the running time of the algorithm. On the other hand, wrapper feature selection (FS) approaches are known to give better results than filter FS methods because the algorithm that uses the data is taken into account. First, a Self-Organizing Map (SOM), trained on the original data set, is clustered using k-means, and the Davies-Bouldin index is used to determine the best number of clusters. Then, an individual pertinence measure guides the backward elimination procedure, and the mutual pertinence of the features is measured using a collective pertinence based on the quality of the clustering.
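As an illustration of the second level of the pipeline, the sketch below (not the authors' code) selects the number of clusters by minimising the Davies-Bouldin index over candidate k-means partitions, then runs a greedy backward feature elimination in which a feature is dropped whenever its removal improves the clustering quality. The first level (the SOM trained on the original data) is omitted here and k-means is applied directly to the data; the function names `best_k` and `backward_elimination` are illustrative, and scikit-learn's `KMeans` and `davies_bouldin_score` stand in for the collective pertinence measure described in the abstract.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score


def best_k(X, k_range=range(2, 8)):
    """Pick the number of clusters minimising the Davies-Bouldin
    index (lower is better)."""
    scores = {k: davies_bouldin_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
        for k in k_range}
    return min(scores, key=scores.get)


def backward_elimination(X, min_features=1):
    """Greedily drop the feature whose removal most improves (lowers)
    the Davies-Bouldin index of the k-means partition."""
    kept = list(range(X.shape[1]))
    k = best_k(X)
    best = davies_bouldin_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
    while len(kept) > min_features:
        trials = []
        for f in kept:
            sub = [c for c in kept if c != f]  # candidate subset without f
            labels = KMeans(n_clusters=k, n_init=10,
                            random_state=0).fit_predict(X[:, sub])
            trials.append((davies_bouldin_score(X[:, sub], labels), f))
        score, worst = min(trials)
        if score >= best:  # no removal improves the partition: stop
            break
        best = score
        kept = [c for c in kept if c != worst]
    return kept, k, best
```

On data with well-separated clusters plus noise dimensions, this procedure tends to discard the noise features; in the paper's actual method the elimination is additionally guided by an individual pertinence measure rather than by the clustering index alone.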
Similar resources
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
The ensemble clustering with maximize diversity using evolutionary optimization algorithms
Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering
One of the most difficult problems in clustering, the task of grouping similar instances in a dataset, is automatically determining the number of clusters that should be created. When a dataset has a large number of attributes (features), this task becomes even more difficult due to the relationship between the number of features and the number of clusters produced. One method of addr...
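The related work above estimates the number of clusters with the silhouette metric (there, inside a particle swarm optimisation loop, which is omitted here). A minimal sketch of the silhouette-based part alone, assuming scikit-learn and k-means as the base clusterer (the function name `best_k_silhouette` is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_silhouette(X, k_range=range(2, 8)):
    """Choose k by maximising the mean silhouette coefficient
    (in [-1, 1]; higher means tighter, better-separated clusters)."""
    return max(k_range, key=lambda k: silhouette_score(
        X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)))
```

Note the opposite convention to the Davies-Bouldin index used in the main paper: silhouette is maximised, Davies-Bouldin is minimised.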
Feature Selection for Clustering
Clustering is an important data mining task. Data mining often concerns large and high-dimensional data, but unfortunately most of the clustering algorithms in the literature are sensitive to largeness or high dimensionality, or both. Different features affect clusters differently: some are important for clusters while others may hinder the clustering task. An efficient way of handling it is by selecting ...